Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces

نویسندگان

  • Su Yan
  • Alex Hai Wang
  • Dongwon Lee
  • C. Lee Giles
چکیده

Clustering high dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high dimensional space. To address this challenge, we propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster membership constraints. In particular, we project high-dimensional data onto a much reduced-dimension subspace, where rough clustering structure defined by the prior knowledge is strengthened. Metric learning is then performed on the subspace to construct more informative pairwise distances. We also propose to propagate constraints locally to improve the informativeness of pairwise distances. When the new methods are evaluated using two real benchmark data sets, they show substantial improvement using only limited prior knowledge.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations

The ability of recording the high resolution spectral signature of earth surface would be the most important feature of hyperspectral sensors. On the other hand, classification of hyperspectral imagery is known as one of the methods to extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in the information content point of view, there...

متن کامل

Combining Hashing and Abstraction in Sparse High Dimensional Feature Spaces

With the exponential increase in the number of documents available online, e.g., news articles, weblogs, scientific documents, the development of effective and efficient classification methods is needed. The performance of document classifiers critically depends, among other things, on the choice of the feature representation. The commonly used “bag of words” and n-gram representations can resu...

متن کامل

یک روش مبتنی بر خوشه‌بندی سلسله‌مراتبی تقسیم‌کننده جهت شاخص‌گذاری اطلاعات تصویری

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been done in order to develop multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

متن کامل

Active Semi-Supervision for Pairwise Constrained Clustering

Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannotlink constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clust...

متن کامل

Constraint Score: A new filter method for feature selection with pairwise constraints

Feature selection is an important preprocessing step in mining high-dimensional data. Generally, supervised feature selection methods with supervision information are superior to unsupervised ones without supervision information. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose to use another form ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009